 kuala lumpur


SeeingSounds: Learning Audio-to-Visual Alignment via Text

Carnemolla, Simone, Pennisi, Matteo, Russo, Chiara, Palazzo, Simone, Giordano, Daniela, Spampinato, Concetto

arXiv.org Artificial Intelligence

We introduce SeeingSounds, a lightweight and modular framework for audio-to-image generation that leverages the interplay between audio, language, and vision, without requiring any paired audio-visual data or training on visual generative models. Rather than treating audio as a substitute for text or relying solely on audio-to-text mappings, our method performs dual alignment: audio is projected into a semantic language space via a frozen language encoder and contextually grounded into the visual domain using a vision-language model. This approach, inspired by cognitive neuroscience, reflects the natural cross-modal associations observed in human perception. The model operates on frozen diffusion backbones and trains only lightweight adapters, enabling efficient and scalable learning. Moreover, it supports fine-grained and interpretable control through procedural text prompt generation, where audio transformations (e.g., volume or pitch shifts) translate into descriptive prompts (e.g., "a distant thunder") that guide visual outputs. Extensive experiments across standard benchmarks confirm that SeeingSounds outperforms existing methods in both zero-shot and supervised settings, establishing a new state of the art in controllable audio-to-visual generation.


ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning

Zhu, Tao, Yu, Yinfeng, Wang, Liejun, Sun, Fuchun, Zheng, Wendong

arXiv.org Artificial Intelligence

Diffusion models have demonstrated remarkable performance in speech synthesis, but typically require multi-step sampling, resulting in low inference efficiency. Recent studies address this issue by distilling diffusion models into consistency models, enabling efficient one-step generation. However, these approaches introduce additional training costs and rely heavily on the performance of pre-trained teacher models. In this paper, we propose ECTSpeech, a simple and effective one-step speech synthesis framework that, for the first time, incorporates the Easy Consistency Tuning (ECT) strategy into speech synthesis. By progressively tightening consistency constraints on a pre-trained diffusion model, ECTSpeech achieves high-quality one-step generation while significantly reducing training complexity. In addition, we design a multi-scale gate module (MSGate) to enhance the denoiser's ability to fuse features at different scales. Experimental results on the LJSpeech dataset demonstrate that ECTSpeech achieves audio quality comparable to state-of-the-art methods under single-step sampling, while substantially reducing the model's training cost and complexity.


Gradient Shaping Beyond Clipping: A Functional Perspective on Update Magnitude Control

You, Haochen, Liu, Baojing

arXiv.org Artificial Intelligence

Gradient clipping is widely used to stabilize deep network training, but its formulation as a hard, fixed threshold limits flexibility and ignores gradient distribution dynamics. We propose SPAMP (Statistical Per-layer Adaptive Modulation and Projection), a unified framework that generalizes clipping into smooth, per-layer gradient shaping. SPAMP tracks local gradient statistics, dynamically estimates thresholds, and applies power-based transformations to modulate update magnitudes in a differentiable manner. This perspective recasts clipping and warmup as dual mechanisms for controlling the effective update scale $\eta_t \|g_t\|$, offering a principled alternative to rigid heuristics. Extensive experiments across image and language tasks demonstrate that SPAMP improves stability, convergence, and robustness over existing methods.
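
To make the contrast in this abstract concrete, the sketch below compares standard hard norm clipping with one possible smooth, power-based shaping rule, plus a running-statistics threshold. It is an illustration only and not the SPAMP formulation, which the abstract does not spell out: the function names, the power-law form, the exponential moving average, and all default values (`power`, `beta`, `multiplier`) are assumptions made for this example.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def hard_clip(grad, threshold):
    """Standard norm clipping: rescale so the norm never exceeds threshold."""
    norm = l2_norm(grad)
    if norm <= threshold or norm == 0.0:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def smooth_shape(grad, threshold, power=0.5):
    """Hypothetical smooth shaping (an assumption, not the paper's rule).

    Below the threshold the gradient is unchanged. Above it, the norm is
    compressed as threshold * (norm / threshold) ** power, which is
    continuous at the threshold and still grows with the original norm.
    """
    norm = l2_norm(grad)
    if norm <= threshold or norm == 0.0:
        return list(grad)
    new_norm = threshold * (norm / threshold) ** power
    return [g * (new_norm / norm) for g in grad]


class RunningThreshold:
    """Hypothetical per-layer threshold: a multiple of an EMA of gradient norms."""

    def __init__(self, beta=0.9, multiplier=2.0):
        self.beta = beta
        self.multiplier = multiplier
        self.ema = None  # initialised from the first observed norm

    def update(self, grad):
        norm = l2_norm(grad)
        if self.ema is None:
            self.ema = norm
        else:
            self.ema = self.beta * self.ema + (1.0 - self.beta) * norm
        return self.multiplier * self.ema


if __name__ == "__main__":
    lr = 0.1  # learning rate eta_t
    grad = [3.0, 4.0]  # norm 5.0
    for name, shaped in [
        ("hard clip", hard_clip(grad, 1.0)),
        ("smooth shape", smooth_shape(grad, 1.0)),
    ]:
        # Effective update scale eta_t * ||g_t|| after shaping.
        print(name, lr * l2_norm(shaped))
```

With a threshold of 1.0, hard clipping maps every gradient with a larger norm to norm 1.0, whereas the power-based rule keeps larger gradients larger while still compressing them. In both cases the quantity being controlled is the effective update scale, the learning rate times the shaped gradient norm.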


AgentCoMa: A Compositional Benchmark Mixing Commonsense and Mathematical Reasoning in Real-World Scenarios

Alazraki, Lisa, Chen, Lihu, Brassard, Ana, Stacey, Joe, Rahmani, Hossein A., Rei, Marek

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved high accuracy on complex commonsense and mathematical problems that involve the composition of multiple reasoning steps. However, current compositional benchmarks testing these skills tend to focus on either commonsense or math reasoning, whereas LLM agents solving real-world tasks would require a combination of both. In this work, we introduce an Agentic Commonsense and Math benchmark (AgentCoMa), where each compositional task requires a commonsense reasoning step and a math reasoning step. We test it on 61 LLMs of different sizes, model families, and training strategies. We find that LLMs can usually solve both steps in isolation, yet their accuracy drops by ~30% on average when the two are combined. This is a substantially greater performance gap than the one we observe in prior compositional benchmarks that combine multiple steps of the same reasoning type. In contrast, non-expert human annotators can solve the compositional questions and the individual steps in AgentCoMa with similarly high accuracy. Furthermore, we conduct a series of interpretability studies to better understand the performance gap, examining neuron patterns, attention maps and membership inference. Our work underscores a substantial degree of model brittleness in the context of mixed-type compositional reasoning and offers a test bed for future improvement.


Malaysia controls AI chip exports as U.S. targets China smuggling

The Japan Times

Malaysia will now require permits for exports of high-performance U.S. artificial intelligence chips, suggesting the government is seeking to clamp down on potential diversion of the sensitive components to places like China. Effective immediately, individuals and companies must notify Kuala Lumpur at least 30 days prior to exporting or shipping such hardware, Malaysia's trade and industry ministry said Monday. They must inform the agency if they know or "have reasonable grounds" to suspect the items will be misused or used for restricted activities. Malaysia "will not tolerate the misuse of Malaysia's jurisdiction for illicit trading activities," the ministry said. Kuala Lumpur has come under increasing pressure from Washington -- which has effectively banned the sale of advanced AI chips to China since 2022 -- to halt the suspected flow of those parts to China via intermediaries in Malaysia.


Xi arrives in Malaysia with a message: China's a better partner than Trump

Al Jazeera

Kuala Lumpur, Malaysia – China's President Xi Jinping has arrived in Malaysia as part of a Southeast Asian tour which is seen as delivering a personal message that Beijing is a more reliable trading partner than the United States amid a bruising trade war with Washington. Xi arrived in the capital, Kuala Lumpur, on Tuesday evening in what is his first visit to Malaysia since 2013. He flew in from Vietnam where he had signed dozens of trade cooperation agreements in Hanoi on everything from artificial intelligence to rail development. On touching down, Xi said that deepening "high-level strategic cooperation" was "good for the common interests of both China and Malaysia, and good for peace, stability and prosperity in the region and the world", according to the official Malaysian news agency Bernama. Xi's three-country tour and his "message" that Beijing is Southeast Asia's better friend than the truculent administration of US President Donald Trump comes as many countries in the 10-member Association of Southeast Asian Nations (ASEAN) bloc are unhappy with their treatment after the US imposed huge tariffs on countries around the world. "This is a very significant visit.


AI-powered chatbot ChatGPT writes poems for Valentine's Day destinations – Travel Weekly

#artificialintelligence

Agoda uses ChatGPT to light a spark for top travel destinations this Valentine's Day. Tokyo, Bangkok, and Singapore are this year's most popular Valentine's Day getaways in Asia, according to data released by global digital travel platform Agoda. The three Asian capitals are the most searched cities by couples celebrating on February 14th this year. To pay homage to these and other romantic hotspots in the top 14 searched destinations, Agoda commissioned AI-powered chatbot ChatGPT to compose unique poems. "Using ChatGPT to compose poetry may be a gimmick, but the artificial intelligence technology powering the chatbot is beyond promising," said Omri Morgenshtern, Chief Executive Officer of Agoda.


Data Science Job Roles, Salaries and Course Fees in Malaysia

#artificialintelligence

Whether you want to acquire a certification from a reputable university, gain experience as a recent graduate, hone vendor-specific abilities, or demonstrate your knowledge of data science, you're in the right spot! DataMites is Malaysia's top provider of data science courses. DataMites Data Science Certification Programmes in Malaysia are an excellent way to learn about data science. You will be given a comprehensive curriculum and will be able to reach your goal in a disciplined manner. The course is often taught by industry specialists and includes high-quality information. Our data science certifications in Malaysia allow you not just to gain hard-to-find skills in your target field, but also to authenticate your data science knowledge. Our entire curriculum is internationally recognised thanks to IABAC's accreditation. The data science training in Malaysia contains hands-on projects that will assist you in developing a portfolio to demonstrate your data science skills to potential employers.


Breaking new ground: Sustainability in Malaysia

MIT Technology Review

Technology is central to the country's sustainability agenda. Malaysia's commercial hub, Kuala Lumpur, has rolled out a smart city plan, which includes accelerating digital transformation by focusing on education and promoting cloud technologies and artificial intelligence (AI), among other areas. The Malaysian government has also emphasized technology investment in its Budget 2022, with up to MYR 100 million (US$ 23.7 million) in grants for areas such as smart automation and at least MYR 30 billion (US$ 7 billion) for government-linked companies investing in renewable energy, supply-chain modernization, and 5G infrastructure. In recent years, Kuala Lumpur has also seen an increasing number of "greening" opportunities. For instance, the city governance has employed a smart "City Brain", which uses Alibaba Cloud's computing systems to optimize services like traffic control and even calculate the best routes for emergency services.


Melbourne taps AI to ease traffic congestion

#artificialintelligence

Australia's University of Melbourne has teamed up with a slew of public and private sector organisations to create an artificial intelligence (AI) application that can predict traffic congestion up to three hours ahead, as well as optimise traffic and improve road safety. The AI application, to be hosted on Amazon Web Services (AWS), can also optimise traffic signals for on-road vehicles, freight and public transport such as buses and trams. Majid Sarvi, a transport engineering expert and director at the university's Australian Integrated Multimodal EcoSystem, an initiative to test integrated transport technology on the streets of Melbourne, said the AI application observes the nature of traffic and figures out complex traffic patterns across the network through machine learning. "If we can upscale the application to provide more accurate prediction with machine learning and real-time data, it will soon be possible to substantially reduce delays in hotspots across Melbourne and many locations across the globe," he added. PeakHour Urban Technologies, a Melbourne-based AI specialist with a focus on transportation, developed the application's AI core engine which uses AWS to power its predictive capabilities.